Image Captioning with Context-Aware Auxiliary Guidance

Abstract

Image captioning is a challenging computer vision task that aims to generate a natural language description of an image. Most recent work follows the encoder-decoder framework, which depends heavily on previously generated words for the current prediction. Such methods cannot effectively take advantage of future predicted information to learn complete semantics. In this paper, we propose a Context-Aware Auxiliary Guidance (CAAG) mechanism that guides the model to perceive global contexts. Built on top of the model, CAAG performs semantic attention that selectively concentrates on useful predictions to reproduce the generation. To validate the adaptability of the method, we apply it to three popular captioners, and our proposal achieves competitive performance on the Microsoft COCO image captioning benchmark, e.g. a 132.2 CIDEr-D score on the Karpathy split and 130.7 (c40) on the official online evaluation server.

Similar resources

Image Captioning with Attention

In the past few years, neural networks have fueled dramatic advances in image classification. Emboldened, researchers are looking for more challenging applications for computer vision and artificial intelligence systems. They seek not only to assign numerical labels to input data, but to describe the world in human terms. Image and video captioning is among the most popular applications in this t...

Context-Aware Image Compression

We describe a physics-based data compression method inspired by the photonic time stretch wherein information-rich portions of the data are dilated in a process that emulates the effect of group velocity dispersion on temporal signals. With this coding operation, the data can be downsampled at a lower rate than without it. In contrast to previous implementation of the warped stretch compression...

Image Captioning with Sparse LSTM

Long Short-Term Memory (LSTM) is widely used to solve sequence modeling problems, for example, image captioning. We found the LSTM cells are heavily redundant. We adopt network pruning to reduce the redundancy of LSTM and introduce sparsity as new regularization to reduce overfitting. We can achieve better performance than the dense baseline while reducing the total number of parameters in LSTM...

Correction: Context-Aware Image Compression

[This corrects the article DOI: 10.1371/journal.pone.0158201.].

Phrase-based Image Captioning

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely bilinear model that learns a metric between an image representat...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i3.16361